Abstract. In this paper we present a semi-automatic 2D-3D local registration pipeline that colors 3D models obtained from 3D scanners using uncalibrated images. The proposed pipeline exploits the Structure from Motion (SfM) technique to reconstruct a sparse representation of the 3D object and to obtain the camera parameters from image feature matches. We then coarsely register the reconstructed 3D model to the scanned one with the Scale Iterative Closest Point (SICP) algorithm. SICP provides the global scale, rotation and translation parameters with minimal manual user intervention. In the final processing stage, a local registration refinement algorithm optimizes the color projection of the aligned photos onto the 3D object, removing the blurring/ghosting artefacts caused by small inaccuracies in the registration. The proposed pipeline can handle real-world cases ranging from objects with few geometric features to complex ones.
1 Introduction

Digitization of cultural heritage objects has gained great attention around the world because of the importance of what these objects represent for each culture. Researchers share a common goal: capturing a 3D digital representation of an object together with its color information, so that both can be passed down safely to future generations.

The recovery and generation of the 3D digital representation requires high geometric accuracy, availability of all details and photorealism [1]. No single 3D imaging technique can fulfil all of these requirements, and the only way to solve this problem is through the fusion of multiple techniques.

A number of recent studies have tried to map a photorealistic appearance onto a 3D model automatically, semi-automatically or manually. Some of these have used only photogrammetry [2], [3], which provides poor geometric precision. However, cultural heritage applications, especially conservation, require a high density of the 3D point cloud. To satisfy these demanding needs, the combination of photogrammetry and range scans [4], [5], [6] has been considered. These approaches generally start by computing an image-to-geometry registration, followed by an integration strategy. The first step seeks to find the calibration parameters of the set of images, while the second tries to select, from the images, the best color for each part of the model.

There has been research focusing on improving the alignment across all the images [7], [8], [9] (global registration). However, the visual results show significant blurring and ghosting artefacts. Others have shown that a perfect global registration is not possible because the two geometries come from different devices, and consequently the only available solution is a local registration refinement [10], [11], [12].

This paper proposes a full end-to-end pipeline that processes data from different acquisition techniques to generate a visual representation of the object that is both realistic and accurate. Our solution recovers the 3D dimension from 2D images in order to align the recovered 3D object with a second, more geometrically accurate scan. The input 2D images are enhanced to improve feature detection by the Structure from Motion (SfM) algorithm, which provides the position and orientation of each image together with a sparse 3D point cloud. The idea behind the 3D reconstruction is to perform the alignment in three dimensions through the Scale Iterative Closest Point (SICP) algorithm, obtaining the transformation parameters to be applied to the extrinsic camera parameters. Even though the alignment minimizes the distance between both 3D models, it is only approximate for several reasons (sparseness, noise), so a local registration refinement is needed. In the last stage of our pipeline, color projection, an algorithm corrects the local color displacement error. Our local correction algorithm works in image space, finding the correct match for each point of the 3D model whose projection is displaced from one image to another.

2 Related Work 

The main issues addressed by our pipeline can be divided into three major fields: (1) 2D/3D registration, (2) color projection and (3) registration refinement. The most important related work in these fields is outlined below.

2.1 2D/3D Registration 

Image-to-geometry registration consists of registering the images with the 3D model by defining all the parameters of the virtual camera (intrinsic and extrinsic) whose position and orientation give an optimal inverse projection of the image onto the 3D model.
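To illustrate what the intrinsic and extrinsic parameters do, the sketch below assumes the standard pinhole camera model, in which a 3D point X maps to pixel coordinates through x ~ K(RX + t), with K the 3x3 intrinsic matrix and R, t the extrinsic rotation and translation. It ignores lens distortion, and the function name `project_point` is hypothetical rather than taken from the paper.

```python
def project_point(X, K, R, t):
    """Project a 3D point onto the image plane of a pinhole camera.

    X: 3D point (world coordinates); K: 3x3 intrinsic matrix;
    R: 3x3 rotation and t: translation (extrinsic parameters).
    Lens distortion is ignored in this sketch.
    """
    # World -> camera coordinates: Xc = R X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        return None  # point lies behind the camera, so it is not visible
    # Camera -> homogeneous pixel coordinates: x = K Xc
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    # Perspective division gives the pixel position (u, v)
    return (x[0] / x[2], x[1] / x[2])
```

Inverting this mapping for every image, that is, finding K, R and t such that projected model points land on the right pixels, is exactly what the registration methods below try to do.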

Numerous techniques have been proposed to solve this problem. They can be classified as (i) manual or (ii) automatic/semi-automatic, the latter depending mainly on matches or features. In (i) manual methods, the registration is performed by manually selecting correspondences between each image and the 3D geometry. This technique is often used for medical applications [13]. Others have used features to automate the process, but finding consistent correspondences is a very complex problem. Due to the different appearance of photographs and geometric models, (ii) automatic methods are limited to specific models and information. For example, line features are mostly used for urban environments, and silhouette information is used when the contour of the object is visible both in the images and in the 3D model projected onto an image plane [15], [16], [17].

Some 3D scanners also provide reflectance images, in which case the registration is performed in 2D space [18], [19]. Other authors perform the registration in 3D space by reconstructing the 3D object from the 2D images and aligning the two 3D objects [?], [?], [20]. This procedure is carried out in two steps: (1) 3D reconstruction and (2) point cloud alignment. Through the widely used Structure from Motion (SfM) technique, a 3D reconstruction and the intrinsic and extrinsic camera parameters are recovered without making any assumptions about the images in the scene. The registration is usually performed by selecting correspondences [?] that minimize the distances between a set of points.

Our work is based on the SfM approach and uses the SICP algorithm [22] to register both point clouds; the only constraint is that the two clouds must first be placed relatively close to each other.

2.2 Color Projection 

Once the images are registered onto the 3D model, the next step is to exploit the photorealistic information (color, texture) obtained by an optical sensor, together with the geometric details (dense point cloud) obtained by some type of 3D scanner (laser scanner, structured light). The aim is to construct a realistic virtual representation of the object.

As a point of the 3D model projects onto several images, and the images may contain artifacts (highlights, shadows, aberrations) or small calibration errors, selecting the correct color for each point is a critical problem. Researchers have proposed different solutions to this task, each with its own pros and cons.


Orthogonal View In [16], [22], the authors assign the best image to each portion of the geometry. This assignment relies on the angle between the viewing ray and the surface normal. As the color of a group of 3D points comes from a single image, seams are produced when adjacent groups are mapped from different images, and artifacts such as differences in brightness and specularities are visible. Even though some research has dealt with the seams by smoothing the transitions [16], important and critical details can be lost.


Weighting Scheme In this kind of approach [4], [9], [23], a specific weight is assigned to each image, or to each pixel in the images, according to different quality metrics. The metrics vary between authors and consider visible points, borders or silhouettes, depth [?], [23], or the distance to the edge of the scan [?]. Compared with the orthogonal view, all these methods are able to eliminate the artifacts mentioned above, but they introduce blurring/ghosting when the calibration of the images is not sufficiently accurate.

Illumination Estimation Alternatively, approaches such as [24] attempt to estimate the lighting environment. This makes it possible to remove illumination artifacts present in the images (shadows/highlights). Unfortunately, in real scenarios it is difficult to accurately recover the position and contribution of all the light sources in the scene.

Considering the evaluation criteria and the advantages of each of these approaches, we selected a weighting procedure as the best option for our work. We used the approach by Callieri et al. [23] because of its robustness, its availability and the good results it gave on our data set.

2.3 Registration Refinement 

Since the data come from two different devices and the geometry and camera calibration remain imprecise after the 2D/3D registration, blurring or ghosting artifacts appear once the color projection is performed. To remove them, a global or local refinement is necessary.

Global Refinement Some approaches try to correct the small inaccuracies globally [7], [8], [9], [16] by computing a new registration of the camera parameters with respect to the dense 3D model obtained with the scanner. The goal is to distribute the alignment error among all the images to minimize the inaccuracies and improve the quality of the final color of the model. Unfortunately, as the registration is mostly based on features, an exact alignment is not possible due to image distortions or low geometric features. Moreover, even if the global refinement finds the best approximate solution, the correspondences across images will not coincide exactly. As a consequence, blurry details (ghosting effects) appear after the color projection [10], especially when the color details lie in areas with low geometric features. The only straightforward solution to correct these small inaccuracies is to perform a local refinement.
Local Refinement A number of studies have been based on local refinements, which analyze the characteristics of each point locally and try to find the best correspondence in the image series [10], [11], [12]. The literature has addressed finding these correspondences locally by using optical flow. Some studies have computed dense optical flow [10], [11], but the results depend on the image resolution, the amount of mismatch (displacement) and the available computational power. Others, instead of working in image space, have tried to optimize the 3D geometry in order to deform textures more effectively [?]. As our 3D geometry cannot be modified, this kind of approach is not feasible for our purpose.

Computing dense optical flow on our datasets was not feasible due to the relatively high resolution of the images, e.g. 4008x5344 pixels compared to the 1024x768 pixels used in [11]. For this reason, we decided to use sparse optical flow to compute the color of each point in the 3D geometry, limiting the number of images to the best three, evaluated according to the quality metrics of Callieri et al. [23].

3 Data Fusion 

Our goal is to fuse the information provided by the two different devices (3D scanner and 2D camera) in order to create a high-resolution, realistic digital visualization with both very accurate geometric and visual detail. Achieving this result requires solving various problems, which is done in four main stages (see Figure 1): (0) image preprocessing; (1) camera calibration through Structure from Motion; (2) cloud registration to align the images to the geometry; and (3) color projection, which selects the most suitable images to project the color onto the 3D geometry. The whole process takes as input a set of uncalibrated images and a dense 3D point cloud or a 3D triangulated mesh. By uncalibrated images we mean images whose intrinsic and extrinsic camera parameters are unknown.



Fig. 1. General overview of the pipeline.


3.1 Stage 0: Image Preprocessing 

Even though a number of studies have used a set of uncalibrated images to perform camera calibration and 3D reconstruction through a Structure from Motion algorithm [?], very few have considered a preprocessing step [?].

This stage is performed to improve the camera calibration procedure (Stage 1) and consequently obtain more accurate camera parameters together with a better 3D representation.

Three preprocessing steps were considered. The first two had already been applied by the C2RMF to their data sets, and introducing a third preprocessing step further improved the results of the next stage.

1. Color calibration. Performed to accurately record the colorimetric appearance of the object in the set of color images and to eliminate mismatches caused by varying lighting conditions. For the calibration, a color chart is captured during image acquisition and used to determine a color transformation between the captured values and the reference color target.

2. Background subtraction. As an SfM procedure is based on feature matching, features will be detected in the background as well as in the foreground. To avoid reconstructing unwanted points (outliers) and to obtain a clean 3D object, the background was removed manually. Many segmentation techniques are available in the literature [25], but the C2RMF chose the manual method for the sake of precision.

3. Image enhancement. Through histogram equalization, we enhance the image contrast in order to find a larger number of features and generate more 3D points in the next stage. The original principle applies to grayscale images, but we applied it to color images by converting them from RGB to the HSV color space and equalizing only the Value (V) channel, which avoids hue and saturation changes [26]. This step is especially useful when the object lacks texture details. The same idea was exploited in [27] with a Wallis filter.
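The V-channel equalization of step 3 can be sketched as follows. This is a minimal illustration built on Python's standard `colorsys` module, assuming pixels are given as (r, g, b) floats in [0, 1] and that V is quantized to 256 levels; the function name is hypothetical, and this is not the implementation used by the C2RMF or the authors.

```python
import colorsys


def equalize_value_channel(pixels, levels=256):
    """Histogram-equalize only the V channel of HSV.

    pixels: list of (r, g, b) floats in [0, 1]. Hue and saturation of each
    pixel are kept; returns a new list of (r, g, b) tuples.
    """
    if not pixels:
        return []
    hsv = [colorsys.rgb_to_hsv(*p) for p in pixels]
    # Quantize V into integer bins and build its histogram.
    bins = [min(int(v * (levels - 1) + 0.5), levels - 1) for _, _, v in hsv]
    hist = [0] * levels
    for b in bins:
        hist[b] += 1
    # Cumulative distribution function of the V histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # a single V level: nothing to spread out
        return [tuple(p) for p in pixels]
    # Classic equalization mapping, normalized to [0, 1].
    lut = [max(0.0, (c - cdf_min) / (n - cdf_min)) for c in cdf]
    # Rebuild RGB with the original hue and saturation and the new V.
    return [colorsys.hsv_to_rgb(h, s, lut[b]) for (h, s, _), b in zip(hsv, bins)]
```

Because only V is remapped, a reddish pixel stays reddish with the same saturation; only its brightness is spread over the full range, which is what helps the feature detector in low-contrast regions.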

3.2 Stage 1. Camera Calibration and 3D Reconstruction

The second stage of our pipeline consists of a self-calibration procedure. It is assumed that the same unknown camera is used throughout the sequence and that the intrinsic camera parameters are constant. The task consists of (i) detecting feature points in each image, (ii) matching feature points between image pairs, and (iii) running an iterative robust SfM algorithm to recover the camera parameters and a 3D structure of the object.

For each image, SIFT keypoints are detected [28], and corresponding matches are found using the approximate nearest neighbors (ANN) kd-tree package by Arya et al. [29], with the RANSAC algorithm [30] used to remove outliers. Then a Structure from Motion (SfM) algorithm [31], [32] is used to reconstruct a sparse 3D geometry of the object and obtain the intrinsic (i.e. focal length, principal point and distortion coefficients) and extrinsic (i.e. rotation and translation) camera parameters.
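RANSAC rejects outlier matches by repeatedly fitting a model to a minimal random sample and keeping the hypothesis with the largest consensus set. SfM pipelines usually fit epipolar geometry (e.g. a fundamental matrix) at this step; to keep the example short, the sketch below assumes a deliberately simplified, hypothetical model in which matched keypoints differ by a pure 2D translation, so that a single match is a minimal sample. Only the hypothesize-and-verify loop is meant to carry over.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=200, seed=0):
    """Estimate a 2D translation between matched keypoints with RANSAC.

    matches: list of ((x1, y1), (x2, y2)) keypoint pairs.
    Returns ((tx, ty), inliers). The translation model is a simplification
    chosen for illustration only.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Hypothesize: one random match fully defines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Verify: count matches consistent with this translation.
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) <= threshold
                   and abs(m[1][1] - m[0][1] - dy) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine on the consensus set (least squares = mean offset).
    n = len(best_inliers)
    tx = sum(b[0] - a[0] for a, b in best_inliers) / n
    ty = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (tx, ty), best_inliers
```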

To obtain a more geometrically complete surface of the 3D object, the Clustering Views for Multi-view Stereo (CMVS) [33] and Patch-based Multi-view Stereo (PMVS) [34] tools are used. They increase the density of the 3D geometry, which allows a more precise parameter estimation during the cloud registration (Stage 2).

3.3 Stage 2. Cloud Registration 

Once the 3D geometries have been obtained with the SfM algorithm and with the 3D scanner, a 3D-3D registration is performed. As the two point clouds have different scales and reference frames, we need to find the affine transformation, given by the scale (s), rotation (r) and translation (t) parameters, that best aligns the two 3D geometries.

Usually, 3D-3D registration refers to the alignment of multiple point clouds scanned with the same device. Algorithms like Iterative Closest Point (ICP) [35] and 4-Points Congruent Sets [36] evaluate the similarity and minimize the distance between the 3D point clouds considering only the rotation and translation parameters. When a scale factor is involved, it can be solved either separately from or jointly with the registration.

Computing a bounding box for both 3D geometries and applying the ratio between them seems to solve the scale problem, but if outliers are present in one of the geometries the result will not be correct. Therefore, Zhu et al. [22] extended the Iterative Closest Point algorithm to also consider the scale transformation (SICP), introducing a bidirectional distance measurement into the least-squares problem. This algorithm works as follows: (i) define a target (fixed) and a source (transforming) point cloud, which are the scanned and the reconstructed point clouds respectively, so that the camera parameters are brought from the image space to the real object space; and (ii) iteratively minimize the distance error, measured as the root mean square error (RMSE), until the best solution is found. The output is a set of 3D points aligned to the object coordinate system (real scale) by means of a 3x3 rotation matrix, a 3x1 scale vector and a translation vector along the X, Y and Z axes.
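The sketch below illustrates the idea of ICP with a scale term. It is a simplified stand-in, not the SICP algorithm of Zhu et al.: it works in 2D (points stored as complex numbers), estimates an isotropic scale instead of a per-axis scale, uses one-directional closest-point matching instead of the bidirectional distance, and uses brute-force nearest-neighbour search. Each iteration re-estimates scale, rotation and translation in closed form from the current matches and tracks the RMSE; all function names are hypothetical.

```python
import cmath
import math


def fit_similarity(src, dst):
    """Least-squares fit of dst ~ a*src + b for paired 2D points stored as
    complex numbers; |a| is an isotropic scale and arg(a) a rotation."""
    n = len(src)
    ms = sum(src) / n
    md = sum(dst) / n
    num = sum((d - md) * (s - ms).conjugate() for s, d in zip(src, dst))
    den = sum(abs(s - ms) ** 2 for s in src)
    a = num / den
    return a, md - a * ms


def closest(point, cloud):
    """Brute-force nearest neighbour of `point` in `cloud`."""
    return min(cloud, key=lambda q: abs(q - point))


def scaled_icp_2d(source, target, max_iter=50, tol=1e-12):
    """Align `source` (moving) to `target` (fixed); returns (a, b, rmse)."""
    a, b = complex(1, 0), complex(0, 0)
    prev_rmse = math.inf
    rmse = math.inf
    for _ in range(max_iter):
        moved = [a * p + b for p in source]
        # Pair every moved source point with its closest target point.
        matches = [closest(m, target) for m in moved]
        # Re-estimate scale, rotation and translation from the pairs.
        a, b = fit_similarity(source, matches)
        residuals = [abs(a * p + b - q) ** 2 for p, q in zip(source, matches)]
        rmse = math.sqrt(sum(residuals) / len(residuals))
        if abs(prev_rmse - rmse) < tol:
            break
        prev_rmse = rmse
    return a, b, rmse


# Example: recover a known transform (scale 1.02, rotation 1 degree,
# small translation) from an L-shaped grid of points.
true_a = 1.02 * cmath.exp(1j * math.radians(1.0))
true_b = complex(0.03, -0.02)
source = [complex(x, y) for x in range(-3, 4) for y in range(-2, 3)
          if not (x > 0 and y > 0)]
target = [true_a * p + true_b for p in source]
a, b, rmse = scaled_icp_2d(source, target)
print(round(abs(a), 6), round(math.degrees(cmath.phase(a)), 6))  # 1.02 1.0
```

As in the pipeline, convergence depends on a good initial placement: the example perturbs the target only slightly, mirroring the manual pre-alignment performed before SICP in Stage 2.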

3.4 Stage 3. Color Projection 

Color projection is the last and the core of our proposed 
pipeline. The aim is to project accurate color information 
onto the dense 3D model to create a continuous visual 
representation from the photographic image set. 

Selecting the best color is not an easy task. First, a single point in the 3D geometry is visible in multiple images, and those images may differ in illumination. Second, small errors in the camera parameters cause small misalignments between the images; consequently, a point that projects onto a specific area in one image plane will project onto a slightly different area in another. This can result in different colors for each 3D point projected from several 2D images.

To address this problem, some approaches select the color from the most orthogonal image for each part of the 3D geometry [16], [22], which generates artifacts such as highlights, shadows and visible seams.
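The orthogonal-view criterion can be sketched as choosing, for a surface point with normal n, the camera whose viewing ray has the largest cosine with n. This minimal version assumes known camera centers and ignores occlusion and image boundaries; the function name is hypothetical.

```python
import math


def most_orthogonal_image(point, normal, camera_centers):
    """Index of the camera whose viewing ray is most aligned with the
    surface normal (largest cosine), or None if no camera faces the point.
    Occlusion and image bounds are not checked in this sketch."""
    best, best_cos = None, 0.0
    n_norm = math.sqrt(sum(c * c for c in normal))
    for k, center in enumerate(camera_centers):
        ray = [center[i] - point[i] for i in range(3)]  # point -> camera
        r_norm = math.sqrt(sum(x * x for x in ray))
        if r_norm == 0:
            continue  # camera center coincides with the point
        cos = sum(ray[i] * normal[i] for i in range(3)) / (n_norm * r_norm)
        if cos > best_cos:
            best, best_cos = k, cos
    return best
```

Assigning whole regions to a single "winner" image in this way is what produces the seams discussed above, since neighbouring regions may pick different images.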

Others project all images onto the 3D mesh and assign a weight to each image, as for example in [4], [9], [23]. This can remove artifacts that the orthogonal view cannot, but it can produce ghosting artifacts when the alignment is not perfect.

To reduce artifacts, we adopt the approach of Callieri et al. [23], which weights all the pixels in the images according to geometric, topological and colorimetric criteria.
The color projection in our pipeline consists of two steps: (1) a color selection based on weights assigned according to the quality of each pixel, and (2) a local error correction in image space to produce sharp results.

Weighted Blending Function Through the approach by Callieri et al. [23], it is possible to compute, for each 3D point, a weight in every image in which it is visible. The weights are based on three metrics: angle, depth and borders. For each metric, a mask with the same resolution as the original image is created. The three masks are then combined into a single mask by multiplication. The result is a weighting mask for each image that represents a per-pixel quality (see an example in Figure 2).



Fig. 2. Example of weighting masks [23]. From left to right: angle mask, depth mask, border mask and, right-most, the combination of the previous three masks. For illustration purposes, the contrast of the depth and border masks has been increased.


Once the weights are defined for each image, and knowing the camera parameters obtained in Stage 1, it is possible to perform a perspective projection from the 3D world onto an image plane. This projection gives the color information for each 3D point in the geometry. The final color of each point is a weighted mean: the RGB values of the pixels are multiplied by their respective weights, summed, and divided by the sum of the weights.
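The mask combination and the weighted mean described above can be sketched as below. Masks are plain nested lists, and each color sample of a point is paired with the combined weight of its pixel. The names are hypothetical, and the per-metric masks of Callieri et al. [23] are taken as given inputs rather than computed here.

```python
def combine_masks(angle, depth, border):
    """Per-pixel product of the angle, depth and border masks (same size)."""
    return [
        [a * d * b for a, d, b in zip(row_a, row_d, row_b)]
        for row_a, row_d, row_b in zip(angle, depth, border)
    ]


def weighted_color(samples):
    """Weighted mean RGB of one 3D point.

    samples: list of ((r, g, b), weight) pairs, one per image where the
    point is visible. Returns None if all weights are zero.
    """
    total = sum(w for _, w in samples)
    if total == 0:
        return None
    return tuple(sum(c[k] * w for c, w in samples) / total for k in range(3))
```

For example, a red sample with weight 3 and a blue sample with weight 1 blend to (191.25, 0.0, 63.75): the better-weighted image dominates, but every sample still contributes, which is why small misalignments show up as blur rather than seams.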


Local Error Correction The results obtained by projecting the color information onto the 3D geometry with the quality metrics of Callieri et al. [23] showed blurring/ghosting effects in some parts of the mesh. These problems are caused by small inaccuracies introduced in the image-to-geometry registration [10].

Research such as [10], [11], [12], [37] has considered this kind of artifact; its origin is explained by Dellepiane et al. [10] and illustrated in Figure 3.

The simplest way to correct the inaccuracies that generate the blurring artifacts is to find, for each 3D point, the local displacement in the best three image planes in which it is visible. This local error estimation algorithm, based on [10], is performed through the template matching procedure shown in Figure 4. The matching is done on a pixel-by-pixel basis in a luminance-chrominance color space.

The sizes of the block and the template were defined through experimental evaluation on the highest-resolution dataset (FZ36147). If the cloud registration in Stage 2 is accurate enough, the projections of a 3D point into the different image planes will not be far from each other. For this reason, the same parameters can be applied to lower-resolution images, but not to higher-resolution ones. The most straightforward solution is to tune the parameters according to the image resolution and the output of the 3D cloud registration.

We consider only the best three images for each point, instead of all the images in which it is visible, to speed up the final color calculation: instead of (n-1)p evaluations we perform (3-1)p, where n is the number of images and p the number of points. Dellepiane et al. [10] stated that three images are enough to correct illumination artifacts.
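Keeping only the three highest-weighted images of a point (step 2 of the procedure in Figure 4) and the resulting reduction in block-matching runs can be sketched as follows. The dictionary layout and the function names are assumptions made for illustration.

```python
def best_three(weights):
    """weights: dict image_id -> quality weight of one 3D point in that image
    (0 when the point is not visible). Returns up to three image ids,
    highest weight first; the first one is the reference image."""
    visible = [(img, w) for img, w in weights.items() if w > 0]
    visible.sort(key=lambda item: item[1], reverse=True)
    return [img for img, _ in visible[:3]]


def matching_runs(n_images, n_points, k=3):
    """Block-matching runs when every point's reference image is compared
    with all other images, versus with the next best k - 1 only."""
    return (n_images - 1) * n_points, (k - 1) * n_points
```

For instance, `matching_runs(35, 4049)` gives (137666, 8098): for a 4049-point patch seen in 35 images, restricting the search to the best three images cuts the work by a factor of 17.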

The conversion of the RGB values into the YCbCr color space was performed directly with the built-in Matlab function 'rgb2ycbcr'. The similarity measure, the mean square error (MSE), also takes into account changes in brightness between blocks by subtracting the average value of each channel. This subtraction compensates for large changes in illumination between images. For each channel c, the MSE is defined as:


$$\mathrm{MSE}_c = \frac{1}{N} \sum_{(i,j)} \Big( \big(S_{ij} - \bar{S}\big) - \big(T_{ij} - \bar{T}\big) \Big)^2 \tag{1}$$








Lecture Notes in Computer Science - Computational Color Imaging 



Fig. 3. Graphic representation of the local displacement defined by Dellepiane et al. [10], where $p_0$ is the original point located on the scanned surface geometry; $\phi_i(p_0)$ represents the projection from the 3D world into a 2D image plane; $\psi_{i,j}(p_i)$ is the relation between corresponding points in different images; $\Delta_{i,j}(p_i)$ is the displacement required to find the correct matches; and $W_{i,j}(p_i)$ is the warping function needed to find the corresponding point in the second image plane.


where the sum runs over the pixels (i, j) of the block, N is the total number of pixels in each block, S is the block in the source (reference) image, T is the block inside the template of the target image, and $\bar{S}$ and $\bar{T}$ are the mean values of their respective channels. The errors of the three channels are averaged, and the candidate block with the minimum error is considered the best match:

$$\mathrm{Error} = \frac{\mathrm{MSE}_Y + \mathrm{MSE}_{Cb} + \mathrm{MSE}_{Cr}}{3} \tag{2}$$
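The block matching described by Eqs. (1) and (2) can be sketched as follows. Assumptions: the channels are 2D lists of floats already in Y, Cb, Cr; the block is 15x15 pixels (half = 7) and the search covers displacements of up to 10 pixels, which matches the 35x35 template of the procedure in Figure 4; ties in Eq. (2) are broken by the squared color distance between the center pixels. The helper conversion uses full-range BT.601 coefficients, which differ from the studio-swing output of Matlab's rgb2ycbcr. All function names are hypothetical.

```python
import random


def rgb_to_ycbcr(r, g, b):
    """Full-range ITU-R BT.601 (JPEG-style) conversion of one 8-bit RGB pixel.
    Matlab's rgb2ycbcr uses the studio-swing variant, so values differ."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def block(img, center, half):
    """Flattened (2*half+1) x (2*half+1) block of a 2D list around (row, col)."""
    r0, c0 = center
    return [img[r][c]
            for r in range(r0 - half, r0 + half + 1)
            for c in range(c0 - half, c0 + half + 1)]


def mean_subtracted_mse(s, t):
    """Eq. (1) for one channel: MSE after removing each block's mean."""
    ms, mt = sum(s) / len(s), sum(t) / len(t)
    return sum(((a - ms) - (b - mt)) ** 2 for a, b in zip(s, t)) / len(s)


def best_match(src, tgt, c_src, c_tgt, half=7, max_disp=10):
    """Search displacements of c_tgt (within +/- max_disp px) for the block in
    tgt that best matches the block of src at c_src.

    src, tgt: dicts mapping channel name ("Y", "Cb", "Cr") -> 2D list.
    Returns (error, (d_row, d_col)), where error is Eq. (2).
    """
    ref = {k: block(src[k], c_src, half) for k in src}
    best_key, best_disp = None, None
    for dr in range(-max_disp, max_disp + 1):
        for dc in range(-max_disp, max_disp + 1):
            c = (c_tgt[0] + dr, c_tgt[1] + dc)
            err = sum(mean_subtracted_mse(ref[k], block(tgt[k], c, half))
                      for k in src) / len(src)
            # Tie-breaker: colour distance between the two centre pixels.
            centre = sum((src[k][c_src[0]][c_src[1]] - tgt[k][c[0]][c[1]]) ** 2
                         for k in src)
            if best_key is None or (err, centre) < best_key:
                best_key, best_disp = (err, centre), (dr, dc)
    return best_key[0], best_disp


# Synthetic check: the target is the source shifted by (3, -2) pixels,
# with a +20 brightness offset in Y that the mean subtraction cancels.
rng = random.Random(0)
src, tgt = {}, {}
for ch in ("Y", "Cb", "Cr"):
    base = [[rng.random() * 255 for _ in range(60)] for _ in range(60)]
    src[ch] = [[base[r + 10][c + 10] for c in range(40)] for r in range(40)]
    offset = 20.0 if ch == "Y" else 0.0
    tgt[ch] = [[base[r + 7][c + 12] + offset for c in range(40)] for r in range(40)]

err, disp = best_match(src, tgt, (20, 20), (20, 20))
print(disp)  # (3, -2)
```

The recovered displacement is what moves the projected point in the second image onto its true correspondence before the colors are averaged.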

When more than one block satisfies the same criterion, a comparison of the colors of the center points decides which block in the template is the closest to the block from the reference image.

Once the three RGB color values are found for each point, we multiply them by their respective weights and average the results to assign the final color values to each point in the 3D geometry.

4 Experimental Results 

In this section we present experiments performed on real data from the C2RMF, with scans of objects from the Department of Roman and Greek Antiquities at the Louvre museum, in order to assess the performance of the proposed pipeline. The data had been captured at different times using different equipment. Each object had data from a structured light scanner and a set of color images used for photogrammetry.

The two data sets (FZ36147 and FZ36152) differ in quality and size. A short description of each is given below, together with a brief explanation of the criteria used for the visual quality evaluation.

- Dataset FZ36147. This Greek vase, an oenochoe from around 610 BC, is represented by 1,926,625 points (pts) and 35 high-resolution images (4008x5344 pixels). The images were acquired under an uncalibrated setup, but our method was capable of removing the lighting artifacts and preserving details in the decorations. For the final evaluation of our local error estimation algorithm, implemented as part of the color projection procedure (Stage 3), three small patches were selected manually from the 3D geometry and extracted. Each patch was carefully selected for visual details in areas where mis-registration of the camera parameters led to blurring artifacts.

- Dataset FZ36152. This Greek vase, an oenochoe dated between 625 and 600 BC, is represented by a 3D model containing 1,637,375 points and by 17 images with a resolution of 2152x3232 pixels. For this dataset, the registration in the second stage of our pipeline is not accurate enough to avoid blurring effects, which appear over the whole 3D geometry. The local error correction of our method produces sharp results in three patches extracted in the same way as for the previous dataset.

1. Compute weights for all the points in all the images according to [23].
2. Sort the weights in descending order and keep only the best 3, together with their respective images.
Steps 3 to 9 are performed for each point:
3. Project the point onto its best image plane (reference plane) and save the colors.
4. Generate a block of 15x15 px from the reference plane.
5. Create a template of 35x35 px centered at the image coordinates found with the next best camera parameters (maximum displacement of 10 px in all directions).
6. Find the best match and save the colors.
7. Multiply each RGB color by its respective weight and update the table.
8. Compute the weighted mean of the 3 colors.
9. Project the colored points onto the mesh.

Fig. 4. Illustration of the local error estimation procedure.

Since the images of the datasets are uncalibrated (no ground truth is available), only qualitative (visual) evaluations were performed, as is also done in the state of the art [10], [11].

The evaluation was performed only on small patches because of the high point density of each 3D geometry and because our implementation runs on the CPU.

The time required to perform the local error estimations depends on the number of 3D points in the geometry and on the displacement found for every projected point in the 2D image plane. If the density increases, the computational time will be higher.

Our algorithm, implemented in Stage 3, corrects the small errors in the projected image planes, converging to good results from the rough alignment obtained in Stage 2. Figure 5 shows the results of the color projection once the small errors are corrected. The quality of the appearance in the new projections (lower red squares) is improved, as the unpleasant blurring artifacts are removed. Table 1 summarizes the characteristics of the datasets used, together with the patches evaluated and their corresponding computational time.

A visual comparison with the state of the art [10] is presented in Figure 6. Dellepiane et al. also evaluated their method on one dataset from the Louvre museum, with different resolution characteristics. The implementation in [10] is based on dense optical flow and GPU programming, for which very high resolution images are a disadvantage. The maximum resolution tested by Dellepiane et al. [10] was 3000x1996 (5,988,000 pixels), which took around 5 hours for 6 images. Our dataset FZ36147 has a resolution of 4008x5344 and contains 35 images, so the number of pixels to be evaluated per image with [10] would be 21,418,752, about 3.58 times more than in their highest-resolution dataset, with nearly 6 times as many images. Their approach could only be applied with extremely powerful processing capable of handling such computations; otherwise, it is not a feasible solution for our data set.

In general, the state-of-the-art methods [11], [10] are based on dense optical flow, which is the main reason why they could not be run on our datasets for comparison.
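The pixel counts compared above can be checked with a few lines of arithmetic:

```python
ours = 4008 * 5344      # pixels per image, dataset FZ36147
theirs = 3000 * 1996    # largest resolution reported in [10]
pixel_ratio = ours / theirs
image_ratio = 35 / 6    # number of images: FZ36147 vs. the dataset in [10]

print(ours)                    # 21418752
print(theirs)                  # 5988000
print(round(pixel_ratio, 2))   # 3.58
print(round(image_ratio, 2))   # 5.83
```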

Even though our implementation has proven to be robust and reliable, some limitations remain. The main one is the computational time, which could be reduced by moving the implementation from CPU to GPU programming. Also, in the evaluations performed, the maximum local displacement found was not large (10 pixels); in other cases (e.g. images with higher resolution), the displacement can be larger, and consequently the parameters of the template matching algorithm in Stage 3 have to be adjusted.

There are also some limitations related to lighting artifacts. Factors like highlights/shadows may complicate the estimation of the local displacement error, and can even mislead the motion estimate toward completely wrong values. Nevertheless, these drawbacks are shared by every method based on optical flow calculations.

Fig. 5. Final color projection in datasets FZ36147 (left) and FZ36152 (right). The first row shows some of the original images; the second to fourth rows show the three patches used for the direct color projection with the approach of Callieri et al. [23] and the local error correction results for each of them (lower red squares).

Dataset   3D model size   N. of images (resolution)   Patch    Patch size   Computational time
FZ36147   1,926,625 pts   35 (4008x5344)              Up       4049 pts     2 hrs 30 min
                                                      Middle   4834 pts     3 hrs 3 min
                                                      Down     3956 pts     6 hrs 40 min
FZ36152   1,637,375 pts   17 (2152x3232)              Up       4750 pts     3 hrs 10 min
                                                      Middle   4903 pts     2 hrs 46 min
                                                      Down     6208 pts     3 hrs 8 min

Table 1. Overview of tests performed with our local error estimation algorithm.


5 Conclusion 

We have proposed a semi-automatic 2D-3D registration pipeline capable of providing accurate and realistic results from a set of uncalibrated 2D images and a 3D object acquired with a 3D scanner.

The main advantage of our pipeline is its generality, since no assumption is made about the geometric characteristics or shape of the object. Our pipeline can handle registration with any kind of object, since the algorithm used (SICP) works by brute force, evaluating every single point to find the best position. The only requirements are a set of 2D images with sufficient overlap to apply the Structure from Motion (SfM) technique in Stage 1, and a user intervention during Stage 2 to place the dense point cloud from the scanner close to the one obtained by SfM, which provides the initialization that the Scale Iterative Closest Point (SICP) algorithm needs to converge. This user intervention in the second stage is what makes our approach semi-automatic.

In conclusion, our main contribution is the local error correction algorithm in Stage 3, which proved to be:

1. Robust: it works with low- and high-resolution images, as it only considers the interest points (3D points projected into the image plane) for the matching. Even the state-of-the-art methods [10], [11] cannot deal with images of as high a resolution as our algorithm can.

2. Accurate: it finds the best possible match for small error displacements, considering both luminance and chrominance channels. Through the best match, it removes the unpleasant blurring artifacts and produces sharp results.

3. Photo-realistic: with the point cloud generated by SfM [31], [32] and the SICP registration algorithm [22], the color information from the 2D images is projected onto the 3D object, transferring its appearance.

An interesting direction for future research would be to define a criterion, with a corresponding threshold, to identify the borders where the sharp results start to blur (in cases where ghosting effects are visible only in some parts of the 3D object). This identification would have to be based on the depth difference between the two registered point clouds, and a segmentation according to depth might help to optimize our proposed local error estimation algorithm.